The exemplary embodiment relates to the linguistic arts. It finds particular application in conjunction with automated natural language processing for use in diverse applications such as information extraction and retrieval, grammar checkers for word processors, document content analyzers, and so forth, and will be described with particular reference thereto. However, it is to be appreciated that the exemplary embodiment is also amenable to other like applications.
Information retrieval tools are widely known which select text passages matching user criteria according to key words input by the user. These tools typically retrieve all available text passages that contain the key words. To improve retrieval of responsive text, natural language processing (NLP) systems have been developed which identify grammatical relationships (syntactic dependencies) between words or phrases in the text. The grammatical relationships are computed by first identifying the parts of speech associated with each word of a sentence. Some words may be associated with two or more parts of speech. For example, “fly” can be a noun or a verb or may form a part of a compound noun, such as “fly wheel.” By applying disambiguation rules, the most likely part of speech can be associated with the word, based on its context. Once the parts of speech are identified, grammatical relationships between the words and between groups of words are computed.
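The two-stage processing described above (part-of-speech disambiguation followed by computation of grammatical relationships) can be illustrated with a minimal sketch. The lexicon, the two context rules, and the function names `disambiguate` and `dependencies` are all hypothetical and chosen only for illustration; an actual NLP system would rely on a much larger lexicon and rule set or on a statistical model.

```python
# Toy lexicon (hypothetical data): each word maps to its candidate parts of speech.
LEXICON = {
    "the": ["DET"],
    "a": ["DET"],
    "birds": ["NOUN"],
    "fly": ["NOUN", "VERB"],  # ambiguous word from the example in the text
    "wheel": ["NOUN"],
    "south": ["ADV"],
    "spins": ["VERB"],
}


def disambiguate(tokens):
    """Assign one part of speech per token using simple left-context rules."""
    tags = []
    for tok in tokens:
        # Assumption: unknown words default to NOUN.
        candidates = LEXICON.get(tok.lower(), ["NOUN"])
        prev_tag = tags[-1] if tags else None
        if len(candidates) == 1:
            tags.append(candidates[0])
        # Rule 1: an ambiguous word right after a determiner is read as a noun.
        elif prev_tag == "DET" and "NOUN" in candidates:
            tags.append("NOUN")
        # Rule 2: an ambiguous word right after a noun is read as a verb.
        elif prev_tag == "NOUN" and "VERB" in candidates:
            tags.append("VERB")
        else:
            # Fall back to the first listed candidate.
            tags.append(candidates[0])
    return list(zip(tokens, tags))


def dependencies(tagged):
    """Derive toy relations from adjacent tags: compound nouns and subject-verb pairs."""
    deps = []
    for (w1, t1), (w2, t2) in zip(tagged, tagged[1:]):
        if t1 == "NOUN" and t2 == "NOUN":
            deps.append(("COMPOUND", w2, w1))  # w1 modifies the head noun w2
        elif t1 == "NOUN" and t2 == "VERB":
            deps.append(("SUBJ", w2, w1))      # w1 is the subject of the verb w2
    return deps


verb_reading = disambiguate(["Birds", "fly", "south"])
noun_reading = disambiguate(["The", "fly", "wheel", "spins"])
print(verb_reading, dependencies(verb_reading))
print(noun_reading, dependencies(noun_reading))
```

In this sketch, "fly" is tagged as a verb in "Birds fly south" and as a noun forming the compound "fly wheel" in "The fly wheel spins", so the relations computed in the second stage depend entirely on the tags chosen in the first.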
When titles of works of art, such as titles of books, movies, plays, and paintings, appear in natural language text, the computed parts of speech often yield inappropriate grammatical relationships. This is because the expressions used to identify the titles of works of art tend to disturb the syntactic order of sentences. As a result, the accuracy of NLP tools can be degraded. For example, consider the sentence: The album featured the singles “Girls Keep Singing”, “DJ” and “Look Back”. Natural language processing of this sentence may identify Girls, Singing, and DJ, respectively, as noun, verb, and adjective and create a dependency between Keep and DJ in which DJ is identified as the object of the verb Keep. Clearly, this analysis is inappropriate, because the chunking of Singing, DJ, and Look as coordinated noun phrases is wrong.